Recognizing voice over IP: a robust front-end for speech recognition on the world wide web
نویسندگان
چکیده
The Internet Protocol (IP) environment poses two relevant sources of distortion to the speech recognition problem: lossy speech coding and packet loss. In this paper, we propose a new front-end for speech recognition over IP networks. Specifically, we suggest extracting the recognition feature vectors directly from the encoded speech (i.e., the bit stream) instead of decoding it and subsequently extracting the feature vectors. This approach offers two significant benefits. First, the recognition system is only affected by the quantization distortion of the spectral envelope. Thus, we are avoiding the influence of other sources of distortion due the encoding-decoding process. Second, when packet loss occurs, our front-end becomes more effective since it is not constrained to the error handling mechanism of the codec. We have considered the ITU G.723.1 standard codec, which is one of the most preponderant coding algorithms in voice over IP (VoIP) and compared the proposed front-end with the conventional approach in two automatic speech recognition (ASR) tasks, namely, speaker-independent isolated digit recognition and speaker-independent continuous speech recognition. In general, our approach outperforms the conventional procedure, for a variety of simulated packet loss rates. Furthermore, the improvement is higher as network conditions worsen.
منابع مشابه
Evaluation and optimization of noise robust front-end technologies for the automatic recognition of Hungarian telephone speech
In this paper a variety of front-end configurations are evaluated on Hungarian telephone speech databases. Our aim was to measure directly the efficiency of the front-ends on real noisy and normal speech data. As a baseline the ETSI ADSR standard front-end is used. Some simplification on the standard is introduced resulting in better performance on our databases than the original front-end in t...
متن کاملQuantization of cepstral parameters for speech recognition over the World Wide Web
We examine alternative architectures for a client-server model of speech-enabled applications over the World Wide Web. We compare a server-only processing model, where the client encodes and transmits the speech signal to the server, to a model where the recognition front end runs locally at the client and encodes and transmits the cepstral coefficients to the recognition server over the Intern...
متن کاملR@7à3spgp3à7vh7pae7fr3dà8p3e7ugpcà8gpàr@7àh7p8gpe3f57 7t3ds3ragfàg8àqh775@àp75g9faragfàqwqr7eqàsf67pàfgaqw 5gf6aragfq @ex9¼xià@sgr
This paper describes a database designed to evaluate the performance of speech recognition algorithms in noisy conditions. The database may either be used for the evaluation of front-end feature extraction algorithms using a defined HMM recognition back-end or complete recognition systems. The source speech for this database is the TIdigits, consisting of connected digits task spoken by America...
متن کاملA study of mutual front-end processing method based on statistical model for noise robust speech recognition
This paper addresses robust front-end processing for automatic speech recognition (ASR) in noise. Accurate recognition of corrupted speech requires noise robust front-end processing, e.g., voice activity detection (VAD) and noise suppression (NS). Typically, VAD and NS are combined as one-way processing, and are developed independently. However, VAD and NS should not be assumed to be independen...
متن کاملNoise robust front-end processing with voice activity detection based on periodic to aperiodic component ratio
This paper proposes a front-end processing method for automatic speech recognition (ASR) that employs a voice activity detection (VAD) method based on the periodic to aperiodic component ratio (PAR). The proposed VAD method is called PARADE (PAR based Activity DEtection). By considering the powers of the periodic and aperiodic components of the observed signals simultaneously, PARADE can detect...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
- IEEE Trans. Multimedia
دوره 3 شماره
صفحات -
تاریخ انتشار 2001